














REPORT 



RESUMES 




ED 017 006 



24 CS OOS 556 

EFFECT OF ERROR OF MEASUREMENT ON THE POWER OF STATISTICAL 
TESTS. FINAL REPORT. 

BY- CLEARY, T. A. LINN, ROBERT L. 

EDUCATIONAL TESTING SERVICE, PRINCETON, N.J. 

REPORT NUMBER BR-6-8574 PUB DATE SEP 67 

GRANT OEG-1-7-068574-2632 

EDRS PRICE MF-$0.25 HC-$2.08 50P. 






DESCRIPTORS- *STATISTICAL ANALYSIS, MENTAL TESTS, 
*RELIABILITY, TEST CONSTRUCTION, *TESTS OF SIGNIFICANCE, 
ANALYSIS OF VARIANCE, *MEASUREMENT TECHNIQUES, 










THE PURPOSE OF THIS RESEARCH WAS TO STUDY THE EFFECT OF 
ERROR OF MEASUREMENT UPON THE POWER OF STATISTICAL TESTS. 
ATTENTION WAS FOCUSED ON THE F-TEST OF THE SINGLE FACTOR 
ANALYSIS OF VARIANCE. FORMULAS WERE DERIVED TO SHOW THE 
RELATIONSHIP BETWEEN THE NONCENTRALITY PARAMETERS FOR 
ANALYSES USING TRUE SCORES AND THOSE USING OBSERVED SCORES. 
THE EFFECT OF THE RELIABILITY OF THE MEASUREMENT AND THE 
SAMPLE SIZE WERE THUS DEMONSTRATED. THE ASSUMPTIONS OF 
CLASSICAL TEST THEORY WERE USED TO DEVELOP FORMULAS RELATING 
TEST LENGTH TO THE NONCENTRALITY PARAMETERS. THREE METHODS OF 
ESTIMATING POWER FOR DIFFERENT CONDITIONS OF SAMPLE SIZE AND 
TEST LENGTH WERE STUDIED. THE COST OF AN EXPERIMENT WAS 
ANALYZED IN TERMS OF A FIXED COST PER SUBJECT AND A VARIABLE 
COST DEPENDENT UPON TEST LENGTH. COMPUTER PROGRAMS WERE 
WRITTEN TO USE THE LEAST SQUARES APPROXIMATION AND THE 
APPROXIMATION BASED ON PATNAIK TO ESTIMATE THE POWER UNDER 
ALL PERMISSIBLE ALLOCATIONS OF RESOURCES TO SAMPLE SIZE AND 
TEST LENGTH. THE PROGRAM RESULTS INDICATE WHICH OF THE 
PERMISSIBLE ALLOCATIONS WILL RESULT IN MAXIMUM POWER. TO 
DEMONSTRATE EMPIRICALLY THE EFFECT OF ERROR OF MEASUREMENT ON 
THE POWER OF STATISTICAL TESTS, SAMPLES OF PERSONS AND ITEMS 
WERE RANDOMLY DRAWN FROM A LARGE POOL OF DATA. TESTS OF 10, 
20, AND 40 RANDOMLY DRAWN ITEMS WERE SCORED FOR SAMPLES WITH 
FOUR AND EIGHT PERSONS PER GROUP. THE EXPECTED TRENDS WERE 
PRESENT BUT NOT DEFINITIVE. (AUTHOR) 





















FINAL REPORT 

Project No. 6-8574 
Contract No. OEG-1-7-068574-2632 



EFFECT OF ERROR OF MEASUREMENT ON 

THE POWER OF STATISTICAL TESTS 



September 1967 



U.S. DEPARTMENT OF HEALTH, 
EDUCATION, AND WELFARE 













Office of Education 
Bureau of Research 



















Effect of Error of Measurement on the 
Power of Statistical Tests 



Project No. 6-8574 

Contract No. OEG-1-7-068574-2632 

T. Anne Cleary* and Robert L. Linn 

September 1967 



The research reported herein was performed pursuant to 
a grant with the Office of Education, U.S. Department 
of Health, Education, and Welfare. Contractors 
undertaking such projects under Government sponsorship 
are encouraged to express freely their professional 
judgment in the conduct of the project. Points of 
view or opinions stated do not, therefore, necessarily 
represent official Office of Education position or 
policy. 



Educational Testing Service 



Princeton, New Jersey 



*The services of Dr. Cleary were subcontracted with 
the University of Wisconsin. 




Contents 

Introduction 
   Problem 
   Purpose 

Part I: Theoretical Development 
   Test Theory 
   Statistical Tests 
   The Power Function 
   Cost of an Experiment 
   Allocation of Resources 
   Conclusions 

Part II: Empirical Demonstration 
   Purpose 
   Method 
   Results 
   Discussion 
   Conclusions 

Summary 

References 

Appendix A 

Appendix B 























Introduction 



Problem 



Discussions of the power of statistical tests can be found in 
almost all basic statistics books. The power function, which gives 
the probability of rejecting a hypothesis, depends upon the 
differences expected in random samples from the same population, 
that is, upon sampling error. Implicit in the usual discussion of 
power is the assumption that the observations are errorless or 
"true" measurements. Sampling error rather than measurement error 
is considered. 

The test theory literature, on the other hand, is concerned 
primarily with the error of measurement (4). Observations are 
considered fallible and repeated measures of the same object are 
expected to vary about the "true" measurement, the expected value 
of the repeated measures. 

Sutcliffe (10) has attempted to consider the two types of 
error simultaneously. Sutcliffe elaborated the implications of 
measurement error for the F test of differences between means and 
demonstrated how measurement error decreases the sensitivity of a 
test of significance. More specifically, Sutcliffe compared the 
ratios of the expected mean square between groups to the expected 
mean square within groups for a single factor analysis of 
variance in two cases: the case of no measurement error and the case 
where observed scores were assumed to include measurement error 
as defined in classical test theory. Sutcliffe showed that the 
power of the test is always greater for the error-free case. 

Lord (6) has given extensive consideration to the 
implications of an item sampling model for mental test theory. Lord has 
shown that item sampling methods can improve the efficiency of the 
experimental design of a study, particularly one concerned with 
group means. 

     (i) If only a limited amount of time can be demanded 
     of each research subject, the total amount of 
     information obtained from a given number of subjects may 
     be greatly increased by item sampling. (ii) If a 
     test can be administered to only one examinee at a 
     time, the examiner's time may be the limiting factor; 
     more information about a group of examinees may be 
     obtained by giving a few items to each examinee 
     instead of giving the entire test to just a few 
     examinees. (iii) With certain tests, scoring costs may be 
     the limiting factor; in this case, it would be better 
     to score a few items from the answer sheet of each 
     examinee than to score all items on the answer sheets 
     of a few examinees. (6, p. 23) 



The item-sampling model has strong advantages in many 
group-comparison situations such as frequently occur in the evaluations 
of educational programs. However, practical administrative 
considerations such as the need for common instructions and testing 
time, the economy of being able to use a single scoring key, and 
the fact that test data must frequently serve several purposes 
often make it desirable to administer the same test to all examinees. 
In such situations, one is faced with the problem of deciding 
whether it is more efficient to improve the sensitivity of a 
planned statistical test by increasing the number of examinees or 
by increasing the test length as a means of reducing the error of 
measurement. 

Overall and Dalal (7) discussed the problem of selecting a 
research design which maximizes power relative to cost. They 
concluded that no matter how unreliable the measurement, it is 
better to use more subjects and obtain a single measurement per 
subject than to obtain several measures on each of fewer subjects. 
As Overall and Dalal pointed out, the above conclusion is based 
on the assumption that there is a fixed cost per measurement 
unit, and this cost is the same whether the units are obtained 
for the same subject or different subjects. 



Purpose 

The purpose of this research was to develop, from the 
assumptions of classical test theory, formulas demonstrating the effect 
of error of measurement on the power of some commonly used 
statistical tests. An important aspect of the research was the 
development of a procedure that would enable the educational researcher 
to estimate whether an attempt to reduce measurement error by 
increasing accuracy of observations or to reduce sampling error by 
increasing the number of observations would be the more effective 
strategy. The implications that various assumptions about 
the fixed and variable costs of testing have for the best 
strategy were investigated, also. Since the assumptions of 
classical theory cannot be expected to hold exactly in practice, the 
effects on statistical tests of increasing test length and the 
number of observations were demonstrated empirically. 



Part I: Theoretical Development 



Test Theory 

In classical test theory, it is assumed that an observation, 
$X_i$ , for individual i is equal to his true score, $T_i$ , plus 
an error score, $E_i$ : 

(1)    $X_i = T_i + E_i$ , 

where the expected value of E equals zero ($E(E) = 0$), the 
variance of E equals $\sigma_E^2$ , and the covariance of T with E , 
$\sigma_{TE}$ , equals zero (4). 

Given these assumptions, it can be shown that: 

(2)    $E(X) = E(T)$ 

and 

(3)    $\sigma_X^2 = \sigma_T^2 + \sigma_E^2$ , 

where $\sigma_X^2$ is the variance of X and $\sigma_T^2$ is the 
variance of T . 

If $\rho$ is the reliability of measurement X , then the variance 
of the error can be written: 

(4)    $\sigma_E^2 = \sigma_X^2 (1-\rho) = \sigma_T^2 \dfrac{(1-\rho)}{\rho}$ . 



If a test is lengthened by combining K unit-length parallel 
tests, the relationships between the parameters of the unit-length 
test and those of the lengthened test are well known: 

(5)    $\sigma_{X_K}^2 = [K + K(K-1)\rho_1] \, \sigma_{X_1}^2$ , 

(6)    $\sigma_{T_K}^2 = K^2 \sigma_{T_1}^2$ , 

(7)    $\sigma_{E_K}^2 = K \sigma_{E_1}^2$ , 

and 

(8)    $\rho_K = \dfrac{K\rho_1}{1 + (K-1)\rho_1}$ , 

where the subscript K denotes the lengthened test and the 
subscript 1 denotes the unit-length test. From the above formulas, 
it is apparent that, if K is larger than one, the three variances 
increase with K : the increase is greatest for the variance of 
the true scores, least for the variance of the error. The change 
in the relative sizes of the variances is reflected in the change 
in the reliability: as K increases, the reliability increases. 
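
The lengthening relationships in formulas 5 through 8 can be checked numerically. The following Python sketch is illustrative only and is not one of the report's programs; the function names are hypothetical, and the unit-length variances passed in are arbitrary example values. 

```python
def lengthened_variances(K, var_true_1, var_error_1):
    """Variances of a test built from K unit-length parallel tests.

    True-score variance grows as K**2, error variance as K, and the
    observed variance is their sum (classical test theory).
    """
    var_true_k = K ** 2 * var_true_1
    var_error_k = K * var_error_1
    var_observed_k = var_true_k + var_error_k
    return var_true_k, var_error_k, var_observed_k


def lengthened_reliability(K, rho_1):
    """Reliability of the lengthened test (the Spearman-Brown form of formula 8)."""
    return K * rho_1 / (1.0 + (K - 1.0) * rho_1)


# Example: unit-length test with equal true and error variance (rho_1 = .5).
var_t, var_e, var_x = lengthened_variances(2, 1.0, 1.0)
rho_2 = lengthened_reliability(2, 0.5)
```

In this example the reliability computed from the lengthened variances, `var_t / var_x`, agrees with `rho_2`, which is the sense in which the change in the relative sizes of the variances is reflected in the reliability. 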







Statistical Tests 

In the derivation and interpretation of statistical tests, the 
observations are generally considered to be free of error of 
measurement, that is, in the language of test theory, the observations 
are true scores. The application of statistical tests to observed 
scores subject to error of measurement is in no way incorrect or 
even necessarily inappropriate: the assumptions of the statistical 
tests may be satisfied by the observed scores. However, if the 
hypotheses are formulated in terms of true scores and tested with 
observed scores, the noncentrality parameter and therefore the 
power can be quite different from what would be expected with true 
scores. Failure to reject the null hypothesis with observed scores 
is not equivalent to a failure to reject the null hypothesis with 
true scores. 

Perhaps one of the most commonly used statistical tests in 
educational research is the F test of the analysis of variance. 
In addition to being commonly used, the F test is closely related 
to other familiar tests: the F test with one and $\nu_2$ degrees of 
freedom is equivalent to the two-tailed t test, and as $\nu_2$ 
approaches infinity, the distribution of $\nu_1 F$ approaches a 
chi-square distribution with $\nu_1$ degrees of freedom. 



Consider a single-factor analysis of variance. The model for 
this analysis is 

(9)    $T_{ig} = M + A_g + B_{ig}$ ,    g = 1, ..., G ;  i = 1, ..., n 

where 

   $T_{ig}$ is the true score for individual i in group g , 
   M is the population true-score mean, 
   $A_g$ is the component of the true score which is due to the 
      effect of treatment g , and 
   $B_{ig}$ is the deviation of an individual's score from the group 
      mean, the error of the analysis-of-variance model. 

The $B_{ig}$ are assumed to be independently and normally 
distributed with expected value of zero and common variance 
$\sigma_B^2$ . Over all possible treatments, g , the sum of the 
$A_g$ is zero and the variance is $\sigma_A^2$ . Table 1 presents 
the expected mean squares for this model. 









TABLE 1 

Expected Mean Squares for a Single-Factor 
Analysis of Variance of True Scores 

Source      Degrees of Freedom      E(MS) 

Between     G-1                     $n\sigma_A^2 + \sigma_B^2$ 
Within      G(n-1)                  $\sigma_B^2$ 
Total       Gn-1 





If the null hypothesis of no difference between treatments 
($\sigma_A^2 = 0$) is true, the test statistic (the ratio of the mean 
square between groups to the mean square within groups) is 
distributed as F with (G-1) and G(n-1) degrees of freedom. If the null 
hypothesis is not true, the test statistic is distributed as a 
noncentral F with the same degrees of freedom and noncentrality 
parameter: 

(10)    $\lambda_T = \dfrac{n\sigma_A^2}{\sigma_B^2}$ . 



If observed scores rather than true scores are used in the 
analysis, the model is 

(11)    $X_{ig} = M + A_g + B_{ig} + E_{ig}$ 

where 

   $X_{ig}$ is the observed score for individual i in group g , 
   $E_{ig}$ is the measurement error for individual i in group g , 
      and 
   M , $A_g$ , and $B_{ig}$ are the same as in the true score model. 

Within each group g , the measurement error, $E_{ig}$ , is 
assumed to have a normal distribution with expected value of zero 
and variance $\sigma_E^2$ . The expected mean squares for this 
analysis are shown in Table 2. 




TABLE 2 

Expected Mean Squares for a Single-Factor 
Analysis of Variance of Observed Scores 

Source      Degrees of Freedom      E(MS) 

Between     G-1                     $n\sigma_A^2 + \sigma_B^2 + \sigma_E^2$ 
Within      G(n-1)                  $\sigma_B^2 + \sigma_E^2$ 
Total       Gn-1 





If the null hypothesis ($\sigma_A^2 = 0$) is true, the test statistic 
has the same distribution as in the error-free case. However, if 
the null hypothesis is false, the test statistic is distributed as 
noncentral F with the same degrees of freedom but with 
noncentrality parameter, 

(12)    $\lambda_X = \dfrac{n\sigma_A^2}{\sigma_B^2 + \sigma_E^2}$ . 

For $\sigma_E^2$ greater than zero, the noncentrality parameter for 
the observed score analysis, $\lambda_X$ , will be smaller than the 
noncentrality parameter for the true score analysis, $\lambda_T$ . Since 
power for the test with given degrees of freedom is a nondecreasing 
function of the noncentrality parameter, the power for the 
true-score analysis is always greater than the power for the observed 
score analysis. 



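For fixed n , the relationship between power and error of 
measurement can be seen by noting that the ratio of the mean 
squares divided by their expected values, 

        $\dfrac{MS_{Between} / (n\sigma_A^2 + \sigma_B^2 + \sigma_E^2)}{MS_{Within} / (\sigma_B^2 + \sigma_E^2)}$ , 

is distributed as central F . Power can then be expressed as 

        $\Pr\left\{ \dfrac{MS_{Between}}{MS_{Within}} > F_\alpha \right\} = \Pr\left\{ F > F_\alpha \, \dfrac{\sigma_B^2 + \sigma_E^2}{n\sigma_A^2 + \sigma_B^2 + \sigma_E^2} \right\}$ . 

This result was obtained previously by Sutcliffe (10). [The 
remainder of this paragraph is illegible in the source.] 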



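The noncentrality parameter $\lambda_X$ can be usefully expressed in 
terms of $\rho$ , the reliability, and of the true score components. 
From formulas 4 and 12, 

(13)    $\lambda_X = \dfrac{\rho n \sigma_A^2}{\sigma_B^2 + (1-\rho)\sigma_A^2}$ 

since 

(14)    $\sigma_T^2 = \sigma_A^2 + \sigma_B^2$ . 

The relationship between the noncentrality parameters for the 
true and observed scores can be seen by substituting formula 10: 

(15)    $\lambda_X = \dfrac{n\rho\lambda_T}{n + (1-\rho)\lambda_T}$ . 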
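
The algebra connecting formulas 12 through 15 is brief and may be written out as follows. This derivation is a reconstruction offered for the reader's convenience; it assumes formula 4 in the form $\sigma_E^2 = \sigma_T^2 (1-\rho)/\rho$ and formula 10 in the form $\lambda_T = n\sigma_A^2/\sigma_B^2$. 

```latex
\begin{align*}
\lambda_X
  &= \frac{n\sigma_A^2}{\sigma_B^2 + \sigma_E^2}
   = \frac{n\sigma_A^2}{\sigma_B^2 + (\sigma_A^2 + \sigma_B^2)\,\dfrac{1-\rho}{\rho}}
   && \text{(formulas 12, 4, 14)} \\[4pt]
  &= \frac{\rho\, n\sigma_A^2}{\rho\,\sigma_B^2 + (1-\rho)(\sigma_A^2 + \sigma_B^2)}
   = \frac{\rho\, n\sigma_A^2}{\sigma_B^2 + (1-\rho)\,\sigma_A^2}
   && \text{(formula 13)} \\[4pt]
  &= \frac{\rho\,\lambda_T}{1 + (1-\rho)\,\lambda_T / n}
   = \frac{n\rho\,\lambda_T}{n + (1-\rho)\,\lambda_T}
   && \text{(dividing by } \sigma_B^2 \text{; formula 15)}
\end{align*}
```

When $\rho = 1$ the last expression reduces to $\lambda_T$, and when $\rho = 0$ it is zero, as expected. 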



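For fixed $\lambda_T$ and n , $\lambda_X$ is a positively accelerated 
function of the reliability of the scores: as $\rho$ increases by equal 
units from zero, the increase in $\lambda_X$ is at first quite small, but 
each further increase in $\rho$ results in a slightly larger increase 
in $\lambda_X$ . The rate of positive acceleration increases with 
$\lambda_T$ . 

When the test length K and the unit-length reliability $\rho_1$ are 
introduced through formula 8, the observed scores noncentrality 
parameter can be expressed: 

(16)    $\lambda_X = \dfrac{nK\rho_1\lambda_T}{nK\rho_1 + (1-\rho_1)(\lambda_T + n)}$ . 

The effect of n and K on the noncentrality parameter, $\lambda_X$ , 
can be seen more clearly if equation 16 is expressed in terms of 
$\phi_T$ where 

(17)    $\phi_T = \dfrac{\sigma_A^2}{\sigma_B^2}$ . 

Thus, $\lambda_T = n\phi_T$ and: 

(18)    $\lambda_X = \dfrac{nK\rho_1\phi_T}{K\rho_1 + (1-\rho_1)(\phi_T + 1)}$ . 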



The noncentrality parameter, $\lambda_X$ , is a strictly increasing 
function of both K and n . However, the effect of increasing 
n is relatively greater than the effect of increasing K since 
K influences both the numerator and the denominator whereas n 
influences only the numerator. In addition, the effect of n upon 
power is increased by the change in degrees of freedom. 
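
The relative effect of n and K on the noncentrality parameter can be illustrated numerically. The following Python sketch is illustrative only; it assumes formula 18 in the form $\lambda_X = nK\rho_1\phi_T / [K\rho_1 + (1-\rho_1)(\phi_T + 1)]$, and the function name and example values are hypothetical rather than taken from the report. 

```python
def lambda_observed(n, K, rho_1, phi_true):
    """Observed-score noncentrality parameter (assumed form of formula 18).

    n        -- persons per group
    K        -- test length in unit-length parallel tests
    rho_1    -- reliability of the unit-length test
    phi_true -- ratio of effect variance to within variance for true scores
    """
    numerator = n * K * rho_1 * phi_true
    denominator = K * rho_1 + (1.0 - rho_1) * (phi_true + 1.0)
    return numerator / denominator


# Example values (arbitrary): 4 persons per group, unit-length test,
# unit reliability .5, true-score variance ratio .5.
base = lambda_observed(4, 1, 0.5, 0.5)
double_n = lambda_observed(8, 1, 0.5, 0.5)   # doubling the sample size
double_k = lambda_observed(4, 2, 0.5, 0.5)   # doubling the test length
```

With these values both changes raise the noncentrality parameter, but doubling n raises it more than doubling K, because K also enters the denominator. 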



The Power Function 



The power function for a statistical test gives the probability 
that the null hypothesis will be rejected given different 
alternative values of the parameter. To determine the power of the F 
test of the analysis of variance, one needs to determine the 
proportion of the area of the noncentral F distribution that falls in 
the critical region. In the single factor analysis of variance, 
the test statistic, $F_o$ , is: 

(19)    $F_o = \dfrac{MS_{Between}}{MS_{Within}}$ 

and the critical region is defined by 

        $F_o > F_\alpha$ 

where $\alpha$ is the significance level of the test. The power function 
for a given $\lambda$ is then given by 

(20)    Power $= \int_{F_\alpha}^{\infty} p_{\nu_1 \nu_2}(F' \mid \lambda) \, dF'$ 

where $F_\alpha$ is the percentage point of the F distribution with 
degrees of freedom $\nu_1$ , $\nu_2$ and the integration is over the 
density function of the noncentral F distribution, F' , with $\nu_1$ 
and $\nu_2$ degrees of freedom and noncentrality parameter $\lambda$ . 

The evaluation of the power function is not simple. Methods 
of evaluating the probability integral have been worked out by 
Wishart (12) and Tang (11), but the amount of labor involved 
generally limits consideration to a few alternative hypotheses. Several 
authors have presented power function curves (2, 3, 5, 8, 9). 
These curves enable one to determine quickly, if approximately, the 
power for a limited number of sets of degrees of freedom and 
noncentrality parameters. The most relevant of these charts for the 
design of experiments are those of Feldt and Mahmoud (2) which 
present curves of constant power, for power equal to .5, .9, 
as a function of n , the number of persons per cell, and $\phi$ , 
the noncentrality parameter. The charts are designed to permit the 
specification of sample size for the testing of main effects in 
the analysis of variance. The limited number of power curves 
restricts the use of the charts to situations in which only a rough 
estimate of power is required. 

Overall and Dalal (7) proposed a method of approximating the 
power of an F test which is very appealing because of its great 
simplicity. Their approximation can be denoted as $\hat{F}/F_\alpha$ where 
$\hat{F}$ is the ratio of the expected mean square between to the expected 
mean square within and $F_\alpha$ is the critical value of the F ratio 
with a significance level of $\alpha$ . It should be noted that $\hat{F}$ is 
not the same as the expected value of F since in general the expected 
value of a ratio is not equal to the ratio of the expected values. 
Nevertheless, $\hat{F}/F_\alpha$ is very simple to compute and can be readily 
expressed in terms of the noncentrality parameter $\lambda$ since 

(21)    $\hat{F} = 1 + \lambda$ . 

Overall and Dalal have shown that for a particular example the 
ratio $\hat{F}/F_\alpha$ has a good linear relationship with the true power 
(correlation equal .988) for a range of true power between .10 and 
.60. They concluded that $\hat{F}/F_\alpha$ is a good index of power which 
". . . should provide an adequate basis for comparing alternative 
permissible experiments." (7, p. 349). However, for values of 
the true power less than .10 or greater than .80 the linear fit is 
not very good. For example, the correlation between true power and 
$\hat{F}/F_\alpha$ for $\nu_1 = 2$ , $\nu_2 = 2, 3, 4, 5, 6, 8, 10, 12, 18, 30,$ and 
60, and $\lambda = 0, 3, 4, 5, 6, 8, 10, 12,$ and 18 is .966 which 
represents a fit that is considerably less adequate than the one 
represented by the correlation of .988 reported in the example by 
Overall and Dalal. 















The tabled values of power given by Overall and Dalal (7) and the 
calculated values of $\hat{F}/F_\alpha$ are presented in columns three and 
six respectively of Table 3. 

Use of the index $\hat{F}/F_\alpha$ in place of the true power can lead 
to erroneous conclusions about the best allocation of resources. 
However, the errors will not be serious since an allocation of 
resources which yields an optimal value of $\hat{F}/F_\alpha$ will yield a true 
power which will be among the highest possible although it may not 
be the absolute maximum. In general, $\hat{F}/F_\alpha$ appears to be a 
useful index: it is easy to calculate and provides a reasonable 
approximation to power. 

The index $\hat{F}/F_\alpha$ does have two minor disadvantages: the 
obtained values do not have the same scale as power, so the 
unmodified index does not indicate the actual power level; the index 
requires only simple hand calculations, but the calculations are 
based on the tabled values of $F_\alpha$ , so the procedure is not well 
suited to the computer. 
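
For the special case $\nu_1 = 2$ treated in Table 3, the index can be computed without a table, because the central F distribution with two numerator degrees of freedom has a closed-form survival function, $(1 + 2x/\nu_2)^{-\nu_2/2}$. The following Python sketch is illustrative only; the closed form is a standard result that the report does not use (the report relies on tabled values of $F_\alpha$), and the function names are hypothetical. 

```python
def f_critical_two_df(nu2, alpha=0.05):
    """Upper-alpha point of central F with 2 and nu2 degrees of freedom.

    Obtained by solving (1 + 2x/nu2) ** (-nu2/2) = alpha for x.
    """
    return (nu2 / 2.0) * (alpha ** (-2.0 / nu2) - 1.0)


def power_index(lam, nu2, alpha=0.05):
    """Index of power: ratio of expected mean squares (1 + lam) over F_alpha."""
    return (1.0 + lam) / f_critical_two_df(nu2, alpha)


# Example: noncentrality 3 with 4 denominator degrees of freedom.
example = power_index(3, 4)
```

Values computed this way differ slightly from tabled entries wherever the tabled $F_\alpha$ was rounded before division. 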

Patnaik (8) has developed an approximation to the noncentral 
F which fits to the noncentral F , F' , a central F 
distribution with the same first two moments: 

(22)    $\int_0^{F^*} p_{\nu_1 \nu_2}(F' \mid \lambda) \, dF' \approx \int_0^{F^*/\omega} p_{\nu \nu_2}(F) \, dF$ 

where 

(23)    $\omega = \dfrac{\nu_1 + \lambda}{\nu_1}$ 

and 

(24)    $\nu = \dfrac{(\nu_1 + \lambda)^2}{\nu_1 + 2\lambda}$ . 

The accuracy of the approximation appears to be quite good. 
For those values of power for which Patnaik compares his 
approximation to the Tang (11) tables, the approximation is generally 
accurate to two decimal places and the error in the third decimal 
place appears to be small near the tails. Patnaik's approximation 
is useful only to the extent that it is possible to evaluate the 
integral of the appropriate central F distribution. A computer 
program written by Holloway and Capp provides one method of 
evaluating the central F integral. This program is presented in 
Appendix A as Subroutine FDIST. 
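
The Patnaik approximation can be sketched in a few lines for the case $\nu_1 = 2$. The following Python code is illustrative only and is not the report's Subroutine FDIST: it assumes $\omega = (\nu_1 + \lambda)/\nu_1$ and $\nu = (\nu_1 + \lambda)^2/(\nu_1 + 2\lambda)$, evaluates the central F integral by Simpson's rule on the incomplete beta function (a substitute technique chosen for brevity), uses a closed-form critical value that holds only for $\nu_1 = 2$, and keeps $\nu$ fractional rather than rounding it. All function names are hypothetical. 

```python
import math


def f_critical_two_df(nu2, alpha=0.05):
    """Upper-alpha point of central F with 2 and nu2 degrees of freedom."""
    return (nu2 / 2.0) * (alpha ** (-2.0 / nu2) - 1.0)


def regularized_incomplete_beta(x, a, b, steps=2000):
    """I_x(a, b) by Simpson's rule; this sketch assumes a >= 1 and b >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3.0) / math.exp(log_beta)


def central_f_cdf(f, df1, df2):
    """Pr(F <= f) for central F with df1 and df2 degrees of freedom."""
    x = df1 * f / (df1 * f + df2)
    return regularized_incomplete_beta(x, df1 / 2.0, df2 / 2.0)


def patnaik_power(lam, nu2, alpha=0.05):
    """Approximate power of the F test with nu1 = 2 and nu2 >= 2."""
    nu1 = 2.0
    omega = (nu1 + lam) / nu1                    # scale factor
    nu = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)    # adjusted numerator d.f.
    f_alpha = f_critical_two_df(nu2, alpha)
    return 1.0 - central_f_cdf(f_alpha / omega, nu, nu2)
```

When the noncentrality parameter is zero, the scale factor is one and the adjusted degrees of freedom equal $\nu_1$, so the approximation returns the significance level itself. 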






Table 3 

Comparison of Methods of Estimating Power for $\nu_1 = 2$ 

                        Overall                      Curve- 
                        & Dalal    Patnaik           Fitting 
$\lambda$    $\nu_2$    Power      Approximation     Estimate     $\hat{F}/F_\alpha$ 

    0          2         .05         .05              -.04           .05 
    0          3         .05         .05              -.02           .10 
    0          4         .05         .05              -.00           .14 
    0          5         .05         .05               .02           .17 
    0          6         .05         .05               .04           .20 
    0          8         .05         .05               .06           .22 
    0         10         .05         .05               .08           .24 
    0         12         .05         .05               .10           .26 
    0         18         .05         .05               .13           .28 
    0         30         .05         .05               .16           .30 
    0         60         .05         .05               .15           .32 

    3          2         .12         .12               .13           .20 
    3          3         .15         .15               .17           .42 
    3          4         .18         .18               .19           .58 
    3          5         .19         .19               .21           .70 
    3          6         .21         .21               .23           .78 
    3          8         .23         .23               .25           .90 
    3         10         .24         .24               .27           .98 
    3         12         .25         .25               .29          1.03 
    3         18         .27         .27               .31          1.13 
    3         30         .28         .28               .34          1.20 
    3         60         .30         .30               .32          1.27 

    4          2         .14         .14               .17           .26 
    4          3         .19         .18               .21           .52 
    4          4         .22         .22               .24           .72 
    4          5         .24         .24               .26           .86 
    4          6         .26         .26               .28           .98 
    4          8         .30         .29               .31          1.12 
    4         10         .31         .31               .33          1.22 
    4         12         .33         .33               .34          1.28 
    4         18         .35         .35               .37          1.41 
    4         30         .37         .37               .39          1.50 
    4         60         .39         .39               .38          1.58 






Table 3 (Cont'd) 

                        Overall                      Curve- 
                        & Dalal    Patnaik           Fitting 
$\lambda$    $\nu_2$    Power      Approximation     Estimate     $\hat{F}/F_\alpha$ 

    5          2         .16         .16               .20           .31 
    5          3         .22         .22               .25           .63 
    5          4         .26         .26               .28           .86 
    5          5         .29         .30               .31          1.04 
    5          6         .32         .32               .33          1.17 
    5          8         .36         .36               .36          1.34 
    5         10         .38         .38               .38          1.46 
    5         12         .40         .40               .40          1.54 
    5         18         .43         .43               .43          1.69 
    5         30         .45         .45               .45          1.81 
    5         60         .47         .47               .43          1.90 

    6          2         .18         .18               .22           .36 
    6          3         .25         .25               .28           .74 
    6          4         .31         .31               .32          1.01 
    6          5         .35         .35               .35          1.21 
    6          6         .38         .38               .37          1.36 
    6          8         .42         .42               .41          1.57 
    6         10         .45         .45               .43          1.71 
    6         12         .47         .47               .45          1.80 
    6         18         .51         .51               .48          1.97 
    6         30         .54         .54               .51          2.11 
    6         60         .56         .56               .48          2.22 

    8          2         .22         .22               .27           .46 
    8          3         .32         .32               .34           .94 
    8          4         .39         .39               .39          1.30 
    8          5         .44         .45               .43          1.56 
    8          6         .49         .49               .46          1.76 
    8          8         .54         .54               .50          2.02 
    8         10         .58         .58               .53          2.20 
    8         12         .60         .60               .55          2.31 
    8         18         .65         .64               .59          2.54 
    8         30         .68         .68               .62          2.71 
    8         60         .70         .70               .60          2.85 



Table 3 (Cont'd) 

                        Overall                      Curve- 
                        & Dalal    Patnaik           Fitting 
$\lambda$    $\nu_2$    Power      Approximation     Estimate     $\hat{F}/F_\alpha$ 

   10          2         .26         .26               .31           .56 
   10          3         .38         .38               .39          1.16 
   10          4         .47         .47               .45          1.58 
   10          5         .53         .54               .50          1.90 
   10          6         .58         .58               .53          2.14 
   10          8         .65         .65               .59          2.46 
   10         10         .68         .68               .62          2.68 
   10         12         .71         .71               .64          2.83 
   10         18         .75         .75               .70          3.10 
   10         30         .79         .79               .73          3.31 
   10         60         .81         .81               .71          3.49 

   12          2         .30         .30               .34           .66 
   12          3         .44         .44               .44          1.36 
   12          4         .54         .54               .51          1.87 
   12          5         .61         .62               .57          2.25 
   12          6         .67         .67               .61          2.54 
   12          8         .73         .73               .67          2.91 
   12         10         .77         .77               .71          3.17 
   12         12         .79         .80               .74          3.34 
   12         18         .83         .84               .80          3.67 
   12         30         .86         .86               .84          3.91 
   12         60         .88         .89               .82          4.12 

   18          2         .39         .40 
   18          3         .59         .59 
   18          4         .71         .72 
   18          5         .79         .79 
   18          6         .84         .84 
   18          8         .89         .89 
   18         10         .92         .92 
   18         12         .94         .94 
   18         18         .96         .96 
   18         30         .97         .97 
   18         60         .98         .98 

[The Curve-Fitting Estimate and $\hat{F}/F_\alpha$ entries for $\lambda = 18$ 
are not legible in the source.] 













The Patnaik approximation and Subroutine FDIST were used to 
obtain the power estimates reported in column four of Table 3. In 
only 12 of the 99 power estimates based on the Patnaik approximation 
in Table 3 is there a difference between these values and the 
tabled values given by Overall and Dalal (7) as large as .01. 
Considering that both the tabled values and these estimates have been 
rounded to the nearest hundredth, there is for all practical purposes 
no difference between the estimates and the tabled values in (7). 



It should be noted that the value of $\nu$ which was calculated 
by formula 24 was rounded to the nearest integer before evaluating 
the integral of the central F distribution. Presumably, the 
accuracy of the power estimates would be slightly improved by 
using fractional values of $\nu$ ; however, in view of the accuracy 
obtained for the example in Table 3, this may be unnecessary for 
practical purposes. 



In an attempt to determine an easily manipulated function 
relating power to degrees of freedom and noncentrality parameter, 
the least-squares method was used to fit power values to functions 
of the parameters. Primary attention was devoted to the power 
function for $\nu_1 = 2$ . A total of 99 power values were used: the 
88 values tabled by Overall and Dalal (7) for $\lambda = 3, 4, 5, 6, 8, 
10, 12,$ and 18 and for $\nu_2 = 2, 3, 4, 5, 6, 8, 10, 12, 18, 30,$ and 
60 ; and 11 values of .05 for which $\lambda = 0$ where $\nu_2$ was the 
same as the tabled values. 



For the curve fitting, $\phi$ and n were substituted for $\nu_2$ 
and $\lambda$ : 

(25)    $n = \dfrac{\nu_2}{\nu_1 + 1} + 1$ 

(26)    $\phi = \lambda / n$ . 



Then various functions of n and $\phi$ were used in the 
least-squares equations: powers of the parameters ranging from 1/2 
to 3, cross-products, and natural logarithms. 

The simplest equation with the fewest terms which resulted 
in the highest correlation with the tabled values was 



(27) Power -10.57 - 1.15n - 8.5H^ + 5.43n<l>^ + 16.23 log 



[n(01)] . 






The above equation resulted in power estimates that had a 
correlation of .9812 with the tabled values of Overall and Dalal 
that are presented in Table 3. These estimates are reported in 
column five of Table 3. In addition the same equation provided a 
reasonable fit to the Overall and Dalal power values for 
$\nu_1 = 1$ (r = .959) and [illegible] (r = .981). 

Using this equation for the values for $\nu_1 = 2$ , the largest 
discrepancies between predicted and true values occurred for large 
values of n and $\phi$ where the estimated power was greater than 
one. If a value of 1.0 is substituted for the estimated power 
values larger than 1.0, the largest discrepancy between predicted 
and true is .106. This degree of accuracy might be sufficient for 
some purposes. The accuracy evaluated by the correlation is greater 
than that of Overall and Dalal's $\hat{F}/F_\alpha$ within the range studied, 
and the scale is the same as power. However, the computation of 
the function is far more difficult than $\hat{F}/F_\alpha$ , although many 
values of the function can be quickly computed by a very simple 
computer program. 



Curve fitting as an approach to the power function should not 
be abandoned. The power functions are not complex curves and there 
is every reason to believe that a reasonable function can be 
obtained. Minimizing the squares of the residuals is perhaps not 
the most appropriate criterion; other criteria should be considered. 
In addition, future work should use more power values for large 
n and $\phi$ so that the asymptote of the power function has better 
representation. 



Cost of an Experiment 



It is obvious that an experimenter can always increase power 
by increasing K and/or n . However, in any practical situation, 
the experimenter has only limited resources at his command and 
would like to be able to design the experiment so that the power 
is maximized within the constraints imposed by the available 
resources. Generally, the experimenter cannot increase both n 
and K : if K is increased n must be decreased. 



Let C denote the total cost per group of the experiment and 
assume that this cost is the same for all groups. Following the 
lead of Cronbach and Gleser (1), it is useful to assume that the 
cost is the same for all subjects and that this cost per subject 
consists of a fixed cost, $C_0$ , which is independent of test 
length and a cost per test unit, $C_1$ . The cost per group, C , 
is then given by: 

(28)    $C = n (C_0 + KC_1)$ 









where n is the number of people per group and K is the length 
of the test. Factors which contribute to the fixed cost, $C_0$ , 
might be length of time required to give instructions and cost of 
bringing the subject to the testing center. The variable cost, 
$C_1$ , would be dependent upon factors such as the per-item scoring 
costs and costs of subject time. There is no real provision in 
this model for test development costs which would be a function 
only of K , the test length. This cost model implies that for a 
constant cost per cell a change in test length from K to K* 
must be accompanied by a change in the number of subjects per cell 
from n to n* where 

(29)    $n^* = \dfrac{n (C_0 + KC_1)}{C_0 + K^* C_1}$ . 

For any given n one can solve formula 28 for the maximum 
allowable K , 

(30)    $K = \dfrac{C - nC_0}{nC_1}$ . 



In the special but rather unrealistic case where $C_0$ is 
equal to zero, the most efficient allocation of resources will 
always be achieved by setting K equal to one regardless of the 
test reliability. This conclusion was drawn by Overall and Dalal 
(7). This can be seen by noting that for $C_0 = 0$ , the cost per 
cell, C , is a constant as long as the product nK is a constant, 
and for a fixed product, nK , not only is the noncentrality 
parameter maximized for K = 1 but so are the degrees of freedom. 
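
The cost relationships can be illustrated with a short calculation. The following Python sketch is illustrative only; it assumes the cost model $C = n(C_0 + KC_1)$, and the function names and dollar figures are hypothetical. 

```python
def max_test_length(total_cost, n, c_zero, c_one):
    """Largest test length K affordable for n persons per group.

    Solves total_cost = n * (c_zero + K * c_one) for K.
    """
    return (total_cost - n * c_zero) / (n * c_one)


def rebalanced_sample_size(n, K, K_new, c_zero, c_one):
    """Persons per group affordable at length K_new for the same cost per group."""
    return n * (c_zero + K * c_one) / (c_zero + K_new * c_one)


# Example: 100 cost units per group, fixed cost 2 and unit cost 1 per subject.
k_max = max_test_length(100.0, 10, 2.0, 1.0)              # ten subjects
n_star = rebalanced_sample_size(10, 8, 3, 2.0, 1.0)       # shorten 8 -> 3 units

# With no fixed cost, only the product n * K is constrained.
n_no_fixed = rebalanced_sample_size(10, 4, 1, 0.0, 1.0)
```

In the last line the fixed cost is zero, so cutting the test from four units to one allows four times as many subjects for the same cost. 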

Allocation of Resources 



To provide the researcher with a method of evaluating the 
relative effectiveness of increasing sample size and increasing 
test length, two computer programs were written in FORTRAN IV. 
Listings of the programs are presented in Appendix A. These 
programs handle only the limited case of a single factor analysis 
of variance. 












Each program reads six parameters:

1. PHITRU: the ratio of the variance of the effects to
   the variance within. This parameter is the true-score
   value of $\phi^2$.

2. COST: the total allowable cost per group, denoted C
   above.

3. CZERO: the fixed cost per test, denoted $C_0$ above.

4. CONE: the variable cost, that is, the cost per test
   unit, denoted $C_1$.

5. REL: the reliability of the unit-length test, denoted
   $\rho_1$.

6. V1: the degrees of freedom for the numerator of the F
   ratio (number of groups minus one), denoted $\nu_1$.

Each of the two programs then computes the maximum number
of persons per cell permissible within the cost constraints. For
each sample size from two to the maximum, the corresponding
maximum K is calculated. The observed-score $\phi^2$ is estimated
by using formula 18. Both programs then estimate the power for
each of the permissible combinations of n and K. All of the power
estimates are printed to permit the identification of the
combination of n and K which yields the maximum power.
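The enumeration just described can be sketched in Python. This is an illustration, not the report's program: the function names are hypothetical, the observed-score ratio and the least-squares coefficients are transcribed from the Appendix A listing, the divisor of 100 is inferred from the printed output in Appendix B, and three groups are assumed because the listing sets the denominator degrees of freedom to 3(n - 1).

```python
import math


def observed_phi2(k, rel, phi2_true):
    """Observed-score ratio for a test of length K (PHIOB in the listing)."""
    return (k * rel * phi2_true) / (rel * k + (1.0 - rel) * (phi2_true + 1.0))


def least_squares_power(n, phi2_obs):
    """Least-squares power approximation (numerator df = 2 only)."""
    raw = (-10.57 - 1.15 * n - 8.54 * phi2_obs + 5.43 * phi2_obs * n
           + 16.23 * math.log(n * (phi2_obs + 1.0)))
    return raw / 100.0


def allocation_table(phi2_true, cost, c0, c1, rel, groups=3):
    """One row per permissible n, with the longest affordable test."""
    n_max = int(cost / (c0 + c1))
    rows = []
    for n in range(2, n_max + 1):
        k = (cost - n * c0) / (n * c1)          # formula 30
        phi2 = observed_phi2(k, rel, phi2_true)
        rows.append({"n": n, "K": k, "nu2": groups * (n - 1),
                     "phi2_obs": phi2, "lambda": n * phi2,
                     "power": least_squares_power(n, phi2)})
    return rows


rows = allocation_table(phi2_true=10.0, cost=3000, c0=80, c1=20, rel=0.10)
best = max(rows, key=lambda r: r["power"])
print(best["n"], round(best["K"], 3), round(best["power"], 3))
```

Because the least-squares function is only an approximation, its values can exceed 1.0 for some inputs, so they are best read as a ranking aid rather than as probabilities.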

The first program, "Allocation of resources based on least-
squares fit of power function," uses the least-squares
approximation to the power function given above. This is extremely
rapid and can compute power estimates for many combinations in a
few seconds. The output of this program consists of the input
parameters and K, n, $\nu_1$, $\nu_2$, the observed $\phi^2$,
$\lambda$, and power. Sample computer printouts can be seen in
Appendix B.

The second program, "Allocation of resources using the
Patnaik approximation," is based upon the noncentral F
approximation developed by Patnaik (8) and presented in equations
22, 23, and 24 above. A subroutine FDIST, written by Holloway
and Capp and revised by McKelvey (see Appendix A), was used both
to obtain the critical F value ($F_\alpha$) and to evaluate the
integral of the central F distribution employed in Patnaik's
approximation of the noncentral F distribution. Sample output
from this program can also be seen in Appendix B. In addition to
the output of the first program, the values of $\nu$ in equation
24 and $F_\alpha$ for each permissible combination of n and K are
printed. This second program takes significantly more time than
the first program: each of the estimated power values requires
about four seconds to compute on the IBM 7044.
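The two quantities that the Patnaik approximation needs before any integral is evaluated can be sketched as follows. This is a minimal illustration with a hypothetical function name; it follows the form used in the Appendix A listing, generalized from a numerator of 2 degrees of freedom to $\nu_1$, and it leaves the central F tail probability to whatever routine is available (FDIST in the report).

```python
def patnaik_parameters(nu1, lam):
    """Return (scale, nu) such that a noncentral F with nu1 numerator
    degrees of freedom and noncentrality lam is approximated by
    scale times a central F with nu numerator degrees of freedom."""
    scale = (nu1 + lam) / nu1
    nu = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)
    return scale, nu


# Power is then approximately P[F(nu, nu2) > F_alpha / scale], where
# F_alpha is the critical value of the central F(nu1, nu2) distribution.
scale, nu = patnaik_parameters(nu1=2, lam=3.0)
print(scale, nu)
```

The listing adds 0.5 to $\nu$ and truncates it to an integer before calling FDIST, which amounts to rounding the adjusted degrees of freedom to the nearest whole number.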

A comparison of the two methods of estimating power is
provided in Table 3. Table 3 also includes the values of power
given by Overall and Dalal (7) and the values of $F / F_\alpha$,
the power approximation suggested by Overall and Dalal (7). As
noted before, the scale for $F / F_\alpha$ is not the same as the
scale for power. If a correlation is used as the measure of the
goodness of the approximation, the methods of estimating power can
be ordered: $F / F_\alpha$, r = [illegible]; least-squares,
r = .981; and Patnaik, r = .9999. Considering the size of the
discrepancies between the estimated and true values, the Patnaik
approximation is clearly superior to the least squares.



Table 4 presents the power estimates computed by the two
programs under three different cost conditions. The total cost, C,
is 3000, $\rho_1$ = .10, and $\nu_1$ = 2 in all cases. Under the
first condition $C_0$ = 0 and $C_1$ = 100. Under these conditions,
power is maximized by increasing sample size to the maximum
allowable given the cost constraints, which for all cases
represented in Table 4 is 30. The estimates based on the Patnaik
approximation accurately reflect this fact. On the other hand, the
least-squares estimates erroneously decrease for the largest values
of n, but the errors in the estimated power are not large. It is
interesting to note that the maximum power in this first cost
condition is much lower than in the other two cases: the large
cost per test unit ($C_1$ = 100) does not permit the use of a very
reliable instrument.



In the second and third cost conditions, the maximum power is 
achieved with a smaller sample size than in the first condition. 

In these two cases the differences between the allocations based 
on the two approximations are minimal. However, the differences 
in the power estimates are not necessarily trivial. 



Conclusions 



Of the three approximations to the power function that were
investigated, the one based on the Patnaik approximation and using
the FDIST program to compute integrals of central F distributions
was by far the most accurate procedure. The only disadvantage of
this method is that it requires considerably more computational
time than the other two estimation methods considered.



The least-squares approximation to the power function which
was developed has the advantage of great computational speed.
However, the method has two major disadvantages in its present
state of development: the approximation is limited to the case
of two degrees of freedom in the numerator, and the power
estimates are not sufficiently accurate for many purposes. In view
of the computational ease of this approach, it is considered to
be a potentially useful line of future research. If sufficiently
accurate estimates could be obtained with a relatively simple
function, a major advantage of this approach would be that the
function could be dealt with analytically more readily than the
integrals of the noncentral F distribution.










Table 4

Estimated Power for C = 3000, $\rho_1$ = .10, $\nu_1$ = 2 (a)

           Least-Squares              Estimates Based on
           Estimates                  Patnaik Approximation

$C_0$      0      80      90          0      80      90
$C_1$    100      20      10        100      20      10

  n
  2      .15     .35     .43        .14     .33     .43
  3      .22     .52     .69        .19     .55     .74
  4      .25     .61     .86        .22     .67     .87
  5      .28     .67     .97        .24     .73     .92
  6      .30     .70    1.04        .25     .76     .94
  7      .31     .72    1.09        .26     .77     .95
  8      .32     .73    1.11        .27     .78     .96
  9      .33     .73    1.12        .27     --      .96
 10      .33     .72    1.12        .28     --      --
 11      .33     .71    1.11        .28     --      --
 12      .34     .70    1.09        .28     .75     --
 13      .34     .69    1.06        .28     .74     .96
 14      .34     .67    1.04        .29     .73     .94
 15      .34     .65    1.00        .29     .72     .94
 16      .33     .63     .97        .29     .71     .93
 17      .33     .61     .93        .29     .69     .91
 18      .33     .59     .89        .29     .65     .90
 19      .33     .56     .84        .29     .63     .89
 20      .32     .54     .80        .29     .61     .85
 21      .32     .52     .75        .30     .59     .83
 22      .31     .49     .70        .30     .57     .78
 23      .31     .46     .65        .30     .52     .75
 24      .31     .44     .60        .30     .50     .72
 25      .30     .41     .54        .30     --      --
 26      .30     .38     .49        .30     .44     .60
 27      .29     .36     .44        .30     --      .52
 28      .29     .33     .38        .30     .36     .46
 29      .28     .30     .33        .30     .33     .38
 30      .27     .27     .27        .30     .30     .30

(a) The maximum value (based on three decimal places) in each
column is underlined in the original table.
-- Entry not legible in the source copy.





The Overall and Dalal (7) method of estimating power,
$F / F_\alpha$, is computationally the simplest and is the only one
of the three methods that is well suited to hand calculations. As
previously noted, this approach does not yield the same scale as
power, and the estimates are much less accurate than those based
on the Patnaik approximation.

It would be feasible, of course, to write a computer program which
uses a subroutine such as FDIST to compute $F_\alpha$, and then
compute and rescale $F / F_\alpha$ as a means of obtaining power
estimates. Such a program presumably would have about twice the
speed of the Patnaik approximation program, since it would involve
only half as many integral evaluations; however, the accuracy of
these estimates would not match the accuracy of the Patnaik
approximation.

Computer programs were written to determine the most efficient
allocation of resources. The two programs are based on the Patnaik
and least-squares approximations, and the Patnaik approximation is
distinctly superior to the least-squares approximation. It is clear
from the sample problems presented that differences in the relative
magnitude of fixed and variable cost result in different optimum
allocations of resources to test length and sample size. The
results are in agreement with the conclusion of Overall and Dalal
(7) that the maximum power under conditions of zero fixed cost is
always obtained by increasing the sample size to the maximum
permissible. Under the more realistic condition of nonzero fixed
cost, however, the maximum power is generally obtained with a
sample smaller than the maximum permissible and a corresponding
test length greater than the minimum unit-length test.




Part II: Empirical Demonstration 



Purpose 



The preceding theoretical development has been based upon the
assumptions of classical test theory. Because the assumptions of
classical test theory cannot be expected to hold exactly in real
data, the effects on the power of statistical tests of changing
sample size and test length were demonstrated empirically.



Method

Subjects: The subjects were 4885 eleventh-grade students who
had participated in "A Study of Academic Prediction and Growth," a
nationwide study sponsored by the Educational Testing Service.

The subjects were divided into groups to permit the study of
group comparisons. A two-group division was provided by sex: 2293
males and 2362 females. A three-group division was arbitrarily
made by dividing subjects into three groups of approximately equal
size on the basis of the mean scores of students in different types
of schools. (Schools were divided into nine types for the original
study.) The type of school, itself, would have provided a more
interesting set of groups for study, but the differences in mean
scores were too small. The sizes of the three groups were:
low-scoring, 2105; middle-scoring, 1276; high-scoring, 1489. The
totals for the two-group and three-group divisions are not the
same: subjects with a missing or inappropriate group designation
for sex or type of school were eliminated from that analysis.

Measures: In 1961, the subjects responded to 190 verbal
type items of the School and College Ability Test (SCAT) and the
Sequential Tests of Educational Progress (STEP). These items
measure verbal aptitude, reading achievement, and writing
achievement. These items were considered to belong to a single
item pool.

Procedure: The 190 items were scored to provide each subject
with a "true" score. All of the subjects in each of the
groups defined above were considered to form a population of
interest. The distributions of the true scores were then analyzed
for these populations.

To show the effect of the error of measurement on the
distribution of the test statistic, items and persons were sampled
from the populations according to the scheme presented in Figure 1.



FIGURE 1

Sampling Matrix:
(Number of Samples Drawn for Each Test Length and Sample Size)

                            Items
Persons per Group       10      20      40

        4              200     200     200
        8              100     100     100



Tests were created by randomly sampling 10, 20, or 40 items
from the total of 190. Samples of persons were created by randomly
drawing four or eight persons from each group. For each sample
of persons, the items that comprised a single randomly generated
test were scored. For each of the designs involving four persons
per group, a total of 200 samples were drawn, and for each of the
designs involving eight persons per group, a total of 100 were
drawn. After the randomly generated tests were scored, analyses of
variance were performed and the distributions of the F statistics
were plotted.
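One replication of this procedure can be sketched in Python. The data layout is an assumption made for illustration (each group is a list of persons, each person a list of 0/1 item responses), and the function names are hypothetical; the F ratio itself is the usual single-factor analysis of variance statistic.

```python
import random


def f_ratio(groups):
    """Single-factor ANOVA F ratio for a list of score lists."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def draw_design(population_by_group, n_items, persons_per_group,
                test_length, rng):
    """Sample one random test and one random set of persons per group,
    then score each sampled person on the sampled items only."""
    items = rng.sample(range(n_items), test_length)
    return [[sum(person[i] for i in items)
             for person in rng.sample(group, persons_per_group)]
            for group in population_by_group]


print(f_ratio([[1, 2, 3], [2, 3, 4]]))
```

Repeating `draw_design` followed by `f_ratio` 200 times for four persons per group, or 100 times for eight, would give an empirical distribution of F of the kind the report plots.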

Results 

Population parameters for each group are presented in Table 5.

TABLE 5

Group True-Score Parameters

                                Standard
                       Mean     Deviation    Skewness      Kurtosis

Sex       Male        105.2       33.0        -.335         -.759
          Female      112.6       31.2        [illegible]   -.933

Scores    Low          99.4       31.0         .125         -.764
          Middle      108.4       32.2        -.100         -.880
          High        122.2       30.5        -.484         -.484



Within each set, the two-group and the three-group, the means are
different, the standard deviations comparable, and the measures of
skewness and kurtosis appropriate for the assumption of a normal
population distribution. The measures reported for skewness and
kurtosis are:

(31)    $\text{Skewness} = \dfrac{\sum (T - \bar{T})^3}{N \sigma^3}$

and

(32)    $\text{Kurtosis} = \dfrac{\sum (T - \bar{T})^4}{N \sigma^4} - 3$

where: T is the score on all 190 items for a given subject,
       $\bar{T}$ is the group mean, and
       $\sigma$ is the group standard deviation.

For a normal population these measures of skewness and kurtosis
should be zero.
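The two measures can be computed directly from a list of scores, as in the sketch below. The function name is hypothetical, and the standard deviation is assumed to be the population form (divisor N), which the text does not state explicitly.

```python
import math


def skewness_kurtosis(scores):
    """Return (skewness, kurtosis) as defined in formulas 31 and 32."""
    n = len(scores)
    mean = sum(scores) / n
    sigma = math.sqrt(sum((t - mean) ** 2 for t in scores) / n)
    skew = sum((t - mean) ** 3 for t in scores) / (n * sigma ** 3)
    kurt = sum((t - mean) ** 4 for t in scores) / (n * sigma ** 4) - 3.0
    return skew, kurt


# A small symmetric example: skewness is zero, kurtosis is negative.
print(skewness_kurtosis([1, 2, 3, 4, 5]))
```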

The results of the analyses of variance are presented in 
Table 6. 



TABLE 6

Population True-Score Analyses of Variance

Two-Group Analysis

Source        df          SS            MS           F

Between         1        64,025        64,025       62.07
Within      4,653     4,799,809         1,031
Total       4,654     4,863,834

Three-Group Analysis

Source        df          SS            MS           F

Between         2       453,338       226,669      232.96
Within      4,867     4,735,485           972
Total       4,869     5,188,823



For the two-group analysis (sex) an F ratio of 62.07 with one and
4653 degrees of freedom was obtained. Although in a sample this
would obviously be a highly significant F value, the value of
$\phi^2$ is only .0133. Thus the values of $\lambda$ are only .0532
for the designs with four persons per group and .1064 for the
designs with eight persons per group.

The F ratio for the three-group analysis of variance was 232.96
with two and 4867 degrees of freedom. This corresponds to a value
of $\phi^2$ equal to .0957. The values of $\lambda$ are .3828 and
.7656 for designs with four and eight persons per group
respectively.
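The arithmetic behind these values can be retraced with a short sketch. The function names are hypothetical; the variances of the group effects (13.75 and 93.01) are the figures quoted in the Discussion, the within-group variances (1031 and 972) come from Table 6, and the noncentrality parameter is taken as n times $\phi^2$, as in the XLAM statement of the Appendix A listing.

```python
def phi_squared(var_effects, var_within):
    """Ratio of the variance of the effects to the variance within."""
    return var_effects / var_within


def noncentrality(n, phi2):
    """Noncentrality parameter for n persons per group."""
    return n * phi2


phi_two = phi_squared(13.75, 1031)     # two-group (sex) analysis
phi_three = phi_squared(93.01, 972)    # three-group analysis
print(round(phi_two, 4), round(phi_three, 4))
print(round(noncentrality(4, round(phi_two, 4)), 4),
      round(noncentrality(8, round(phi_three, 4)), 4))
```

The report's values of $\lambda$ match the products formed from the rounded $\phi^2$ figures, which is why the sketch rounds before multiplying.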

The distributions of the observed F ratios for the two-group
analyses are presented in Figure 2. For two groups and four persons
per group there are one and six degrees of freedom and, for
$\alpha$ = .05, the critical F value is 5.99. With two groups and
eight persons per group there are one and 14 degrees of freedom and
the critical F value for $\alpha$ = .05 is 4.60.

The six distributions shown in Figure 2 do not differ markedly
from each other. All six distributions are "J" shaped. For the
four-person design there is a steady decrease in the number of F
ratios at the low end of the scale as the number of items is
increased. In the distributions for eight people the decrease in
low values of observed F ratios appears when the number of items
increases from 10 to 20, but for 40 items the unusually large
number of cases in the lowest interval destroys this trend.

In Figure 3 the comparable distributions of observed F
ratios for the three-group analyses are presented. The critical F
values for $\alpha$ = .05 are 4.26 and 3.47 for designs
with four and eight persons per group respectively. The degrees
of freedom for these analyses are two and nine for four persons
per group and two and 21 for eight persons per group.

The distributions in Figure 3 are much less "J" shaped, and
more nearly symmetrical, than their counterparts in Figure 2.
The distributions do not change systematically in the four-person
designs as the number of items is increased. In the eight-person
designs there is some tendency toward larger F ratios as the
number of items is increased. The most noticeable difference is
between the four- and eight-person designs: larger F ratios are
observed in the eight-person designs.

In Table 7, the proportion of observed F ratios that exceed
the critical value ($\alpha$ = .05) for each experimental design is
reported. In all but one of the 12 experimental designs the
"observed power" is greater than .05. The observed power is greater
in the three-group design than in the two-group design. Within
each design there is generally greater observed power for the
eight- than for the four-person designs, and observed power tends
to increase as the number of items is increased.
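"Observed power" as used here is simply a proportion, which the sketch below makes explicit. The function name and the short list of F ratios are hypothetical and serve only to illustrate the calculation.

```python
def observed_power(f_values, f_critical):
    """Proportion of observed F ratios that exceed the critical value."""
    return sum(1 for f in f_values if f > f_critical) / len(f_values)


# Four made-up F ratios judged against the critical value for
# one and six degrees of freedom (5.99 at the .05 level).
print(observed_power([0.5, 6.2, 1.1, 7.0], 5.99))
```

With 200 samples per four-person design, an observed power of .055 corresponds to 11 of the 200 F ratios exceeding the critical value.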



[Figure 2. Distributions of observed F ratios for the two-group
analyses, by number of items and persons per group. Vertical axis:
percentage. Plot not recoverable from the source copy.]

[Figure 3. Distributions of observed F ratios for the three-group
analyses, by number of items and persons per group. Vertical axis:
percentage. One panel is labeled "4 People (F.05 = 4.26)". Plot not
recoverable from the source copy.]



TABLE 7

Observed Power

                                   Items
                           10       20       40

Two-Group Analyses
    4 Persons             .055     .085     .085
    8 Persons             .080     .040     .080

Three-Group Analyses
    4 Persons             .080     .105     .130
    8 Persons             .150     .160     [illegible]



Discussion 



The empirical distributions of the F ratios presented in
Figures 2 and 3 do not contain enough data points to provide very
smooth or very stable results. It is clear, however, that the
probability of detecting a population true score difference by the
methods used is not great.

For the two-group analyses, the population true score
noncentrality parameters, $\lambda$, are only .0532 and .1064 for
the four- and eight-person analyses respectively. The values of
$\lambda$ for observed scores are even smaller. The variance of the
group effects is in each case 13.75. Relative to the within-group
variance of 1031, this variance is very small, and in view of the
$\lambda$ values one would not expect the power to be much greater
than .05, and it was not.



For the three-group analyses, the population true score
noncentrality parameters, $\lambda$, are .3828 and .7656 for the
four- and eight-person analyses respectively. While not large,
these values are on the order of eight times as large as the
corresponding $\lambda$ values in the two-group analyses, and the
degrees of freedom for both numerator and denominator are larger
for the three-group analyses than they are for the two-group
analyses. The variance of the group effects is in each case 93.01.
The theoretical power for the three-group analyses would be greater
than for the two-group analyses, but it would still be less than
.10. The observed power which is reported in Table 7 was greater
for the three-group analyses than for the two-group analyses.

Although the true score population differences were small in
both examples, the differences could be of psychological or
educational importance. But it is clear that differences of this
magnitude are not likely to be detected with samples of the size
used in these examples.



Conclusions 



The empirical demonstration of the effect of error of
measurement on the power of statistical tests was limited by the
small population true score differences among the groups. The
expected trends were not clearly demonstrated, but some indication
of increasing power with increasing reliability of the instrument
and with increasing sample size was observed. The effect of
increasing n appeared to be relatively greater than the effect of
increasing K, which is in agreement with theoretical expectations.

The most striking feature of the demonstration, however, was
that the population mean differences which are reported in Table 5,
and which appear to reflect the magnitude of differences in which
the educational researcher is often interested, have little chance
of being detected with the four- or eight-person designs studied
here.



Summary 



The purpose of this research was to study the effect of error
of measurement upon the power of statistical tests. Attention was
focused on the F test of the single factor analysis of variance.
Formulas were derived to show the relationship between the
noncentrality parameters for analyses using true scores and those
using observed scores. The effects of the reliability of the
measurement and the sample size were thus demonstrated. The
assumptions of classical test theory were used to develop formulas
relating test length to the noncentrality parameters.





Three methods of estimating power for different conditions of
sample size and test length were studied. The three methods were:
$F / F_\alpha$ suggested by Overall and Dalal (7), a least-squares
approximation, and an approximation based on the work of Patnaik
(8). The approximation based on Patnaik's work was significantly
more accurate than the other two methods but required more
computational time.



The cost of an experiment was analyzed in terms of a fixed
cost per subject and a variable cost dependent upon test length.
Computer programs were written to use the least-squares
approximation and the approximation based on Patnaik to estimate
the power under all permissible allocations of resources to sample
size and test length. The program results indicate which of the
permissible allocations will result in maximum power.

To demonstrate empirically the effect of error of measurement
on the power of statistical tests, samples of persons and
items were randomly drawn from a large pool of data. Tests of
10, 20, and 40 randomly drawn items were scored for samples with
four and eight persons per group. The expected trends were
present but not definitive.
















References 



1. Cronbach, L. J., and Gleser, Goldine C. Psychological Tests
   and Personnel Decisions. (2nd ed.) Urbana, Illinois:
   University of Illinois Press, 1965.

2. Feldt, L. S., and Mahmoud, M. W. "Power Function Charts for
   Specification of Sample Size in Analysis of Variance,"
   Psychometrika. XXIII, September 1958. p. 201-210.

3. Fox, M. "Charts for the Power of the F-Test," Annals of
   Mathematical Statistics. XXVII, June 1956. p. 484-497.

4. Gulliksen, H. Theory of Mental Tests. New York: John Wiley
   & Sons, Inc. 1950.

5. Lehmer, Emma. "Inverse Tables of Probabilities of Errors of
   the Second Kind," Annals of Mathematical Statistics.
   XV, December 1944. p. 388-398.

6. Lord, F. M. Item Sampling in Test Theory and in Research
   Design. Research Bulletin 65-22, Princeton, N. J.:
   Educational Testing Service, 1965.

7. Overall, J. E., and Dalal, S. N. "Design of Experiments to
   Maximize Power Relative to Cost," Psychological Bulletin.
   LXIV, November 1965. p. 339-350.

8. Patnaik, P. B. "The Non-Central $\chi^2$- and F-Distributions
   and Their Applications," Biometrika. XXXVI, June 1949.
   p. 202-232.

9. Pearson, E. S., and Hartley, H. O. "Charts of the Power
   Function for Analysis of Variance Tests, Derived from
   the Non-Central F-Distribution," Biometrika. XXXVIII,
   June 1951. p. 112-130.

10. Sutcliffe, J. P. "Error of Measurement and the Sensitivity of
    a Test of Significance," Psychometrika. XXIII, March 1958.
    p. 9-17.

11. Tang, P. C. "The Power Function of the Analysis of Variance
    Test," Statistical Research Memoirs. II, 1938. p. 126-149.

12. Wishart, J. "A Note on the Distribution of the Correlation
    Ratio," Biometrika. XXIV, November 1932. p. 441-456.



APPENDIX A

FORTRAN Program Lists

Program                                     Page

Program using least-squares approach        A-1
Program based on Patnaik approximation      A-2
Subroutine FDIST                            A-3




C     ALLOCATION OF RESOURCES BASED ON LEAST SQUARES FIT OF POWER
C     FUNCTION
C     (COLUMN SPACING INSIDE THE FORMAT 7 HEADING IS APPROXIMATE)
    1 CONTINUE
      READ 5, PHITRU, COST, CZERO, CONE, REL, V1
    5 FORMAT (F5.1, 3F5.0, F5.2, F5.0)
      IF (PHITRU) 99, 99, 88
   88 CONTINUE
      PRINT 6, PHITRU, COST, CZERO, CONE, REL, V1
    6 FORMAT (1H1, F5.1, 3F7.0, F5.2, F5.0)
      PRINT 7
    7 FORMAT (99H0          K         N       NU1       NU2  OB PHI**2
     1  OB LAMDA     POWER                                          )
      NMAX = COST/(CZERO + CONE)
      DO 20 N = 2, NMAX
      PEOP = N
      XK = (COST - PEOP*CZERO)/(PEOP*CONE)
      PHIOB = (XK*REL*PHITRU)/(REL*XK + (1.0-REL)*(PHITRU+1.0))
      POWER = -10.57 - 1.15*PEOP - 8.54*PHIOB + 5.43*PHIOB*PEOP +
     1        16.23*ALOG(PEOP*(PHIOB+1.0))
      POWER = POWER/100.
      XLAM = PEOP*PHIOB
      V2 = 3*(N-1)
      IV1 = V1
      IV2 = V2
      PRINT 10, XK, N, IV1, IV2, PHIOB, XLAM, POWER
   10 FORMAT (F11.3, 3I10, 4F10.3)
      PUNCH 11, XK, N, IV1, IV2, PHIOB, XLAM, POWER
   11 FORMAT (F8.3, 3I6, 4F8.3)
   20 CONTINUE
      GO TO 1
   99 CONTINUE
      END



C     ALLOCATION OF RESOURCES USING THE PATNAIK APPROXIMATION
C     (COLUMN SPACING INSIDE THE FORMAT 7 HEADING IS APPROXIMATE)
    1 CONTINUE
      READ 5, PHITRU, COST, CZERO, CONE, REL, V1
    5 FORMAT (F5.1, 3F5.0, F5.2, F5.0)
      IF (PHITRU) 99, 99, 88
   88 CONTINUE
      PRINT 6, PHITRU, COST, CZERO, CONE, REL
    6 FORMAT (1H1, F5.1, 3F7.0, F5.2)
      PRINT 7
    7 FORMAT (90H0        K         N       NU1       NU2    NU OBS
     1PHI**2 OBS     LAMDA    FALPHA     POWER                     )
      NMAX = COST/(CZERO + CONE)
      DO 20 N = 2, NMAX
      PEOP = N
      XK = (COST - PEOP*CZERO)/(PEOP*CONE)
      PHIOB = (XK*REL*PHITRU)/(REL*XK + (1.0-REL)*(PHITRU+1.0))
      XLAM = PEOP*PHIOB
      V2 = 3*(N-1)
      PHI = SQRT(PHIOB)
      IV2 = V2
C     CRITICAL F VALUE FOR ALPHA = .05 WITH 2 AND IV2 DF
      FALPHA = 0.
      CALL FDIST (2, IV2, FALPHA, .95)
      GALPHA = FALPHA
C     PHIOB TEMPORARILY HOLDS THE NONCENTRALITY PARAMETER
      PHIOB = PEOP*PHIOB
      SCALE = (2.+PHIOB)/2.
      FALPHA = FALPHA/SCALE
      V = ((2. + PHIOB)**2)/(2.+2.*PHIOB)
C     THE .5 IS INFERRED FROM THE PRINTED NU VALUES IN APPENDIX B
      V = V + .5
      IV = V
      PROB = 0.
      CALL FDIST (IV, IV2, FALPHA, PROB)
      PROB = 1.0 - PROB
      IV1 = V1
      IV2 = V2
      XYLAM = PHIOB
      PHIOB = PHIOB/PEOP
      PRINT 70, XK, N, IV1, IV2, V, PHIOB, XYLAM, GALPHA, PROB
   70 FORMAT (F10.3, 3I10, 5F10.3)
      PUNCH 71, XK, N, IV1, IV2, V, PHIOB, XYLAM, GALPHA, PROB
   71 FORMAT (F8.3, 3I6, 5F8.3)
   20 CONTINUE
      GO TO 1
   99 CONTINUE
      END




























      SUBROUTINE FDIST (MM,NN,FX,PROBX)
C     CLARK HOLLOWAY AND W.B.CAPP, AUGUST 31,1959
C     REVISED APRIL 1,1961 R.J.MCKELVEY
C     (STATEMENT LABELS 100-111 AND 21 ARE RESTORED FROM THE BRANCH
C     TARGETS. THE TEXT OF FORMAT 176 IS ONLY PARTLY LEGIBLE.)
      DIMENSION B(2)
      NOUT=6
      SF = 0.0
      SPROB = 0.0
      F=FX
      PROB=PROBX
      M=MM
      N=NN
      IF(F) 76,106,100
  100 SF = F
      IF (F-1.0) 101,101,105
  101 XM=M
      XN=N
      LOW = 1
      FLO = F
  102 PLO = 0.0
      DELTA=FLO/500.0
      GO TO 21
  105 FLO=1.0/F
      XM=N
      XN=M
      LOW = 0
      GO TO 102
  106 SPROB = PROB
      IF(PROB)76,76,107
  107 IF (PROB-0.5) 108,108,110
  108 XM=M
      XN=N
      LOW=1
      PLO = PROB
  109 FLO = 0.0
      DELTA=PLO/200.0
      GO TO 21
  110 IF(PROB-1.)111,76,76
  111 XM=N
      XN=M
      LOW = 0
      PLO = 1.0 - PROB
      GO TO 109
   21 FACTL=0.0
      FACT=1.0
      B(1)=(XM-2.0)/2.0
      B(2)=(XN-2.0)/2.0
      A=(XM+XN-2.0)/2.0
  241 IF(A-0.2)400,76,242
  400 FACT = 0.31830989
      GO TO 283
  242 IF(A-0.7)410,76,243
  410 FACT=0.5
      GO TO 283
  243 DO 245 I=1,2
      IF(B(I)-0.7)261,76,245
  261 IF(B(I)-0.2)264,76,262
  262 FACT=FACT/0.886226925
  263 B(I)=1.0
      GO TO 245
  264 IF(B(I)+0.2)265,76,263
  265 FACT=FACT/1.772453850
      GO TO 263
  245 CONTINUE
  244 IF(A-0.7)281,76,251
  251 FACT=FACT*A/(B(1)*B(2))
      IF (FACT-99999999.)830,283,283
  830 IF(FACT-1.0E-8)283,283,26
   26 A=A-1.0
      B(1)=B(1)-1.0
      B(2)=B(2)-1.0
      GO TO 243
  281 IF(A-0.2)283,76,282
  282 FACT=FACT*0.886226925
  283 FACTL=FACTL+ALOG(FACT)
      FACT=1.0
      IF(A-0.7)284,76,26
  284 Y1=FACTL+(XM/2.0)*ALOG(XM/XN)
      Y2=(XM-2.0)/2.0
      Y3=(XM+XN)/2.0
   36 F=DELTA/2.0
      CUM=0.0
C
   37 HFDL=Y1+Y2*ALOG(F)-Y3*ALOG(1.0+XM*F/XN)+ALOG(DELTA)
      IF (HFDL+20.) 50,51,51
   50 HFD=0.0
      GO TO 52
   51 HFD=EXP(HFDL)
   52 CUM=CUM+HFD
      F=F+DELTA
  375 IF (F-FLO) 37,37,38
   38 IF(PLO)76,39,381
  381 IF(HFD)76,384,382
  382 IF(ALOG(PLO)-HFDL-4.604)383,384,384
  383 DELTA=DELTA/2.0
      GO TO 36
  384 IF(CUM-PLO) 37,39,39
   39 FLO=F-DELTA
      IF(SF) 76,43,40
   40 F = SF
      IF(LOW) 76,42,41
   41 PROB = CUM
      GO TO 49
   42 PROB = 1.0 - CUM
      GO TO 49
   43 PROB = SPROB
      IF(LOW) 76,45,44
   44 F = FLO
      GO TO 49
   45 F = 1.0/FLO
   49 PROBX=PROB
      FX=F
 1000 RETURN
   76 WRITE (6,176) MM,NN,FX,PROBX
C     FORMAT 176 BELOW IS TRANSCRIBED AS FAR AS IT CAN BE READ
  176 FORMAT (10X,36HCOUL... WORK F DISTRIBUTION WITH ...
     1(I6,1H,I6,1H,E13.6 ...
      GO TO 1000
      END










APPENDIX B 



Sample Program Output 



Source                                      Page

Program using least-squares approach

Program based on Patnaik approximation      B-4










 10.0  3000.     0.   100. 0.10   2.

       K      N    NU1    NU2   OB PHI**2   OB LAMDA    POWER

  15.000      2      2      3       1.316      2.632    0.151
  10.000      3      2      6       0.917      2.752    0.215
   7.500      4      2      9       0.704      2.817    0.253
   6.000      5      2     12       0.571      2.857    0.278
   5.000      6      2     15       0.481      2.885    0.295
   4.286      7      2     18       0.415      2.905    0.308
   3.750      8      2     21       0.365      2.920    0.318
   3.333      9      2     24       0.326      2.932    0.325
   3.000     10      2     27       0.294      2.941    0.329
   2.727     11      2     30       0.268      2.949    0.333
   2.500     12      2     33       0.246      2.956    0.335
   2.308     13      2     36       0.228      2.961    0.336
   2.143     14      2     39       0.212      2.966    0.336
   2.000     15      2     42       0.198      2.970    0.335
   1.875     16      2     45       0.186      2.974    0.334
   1.765     17      2     48       0.175      2.977    0.332
   1.667     18      2     51       0.166      2.980    0.329
   1.579     19      2     54       0.157      2.983    0.326
   1.500     20      2     57       0.149      2.985    0.322
   1.429     21      2     60       0.142      2.987    0.319
   1.364     22      2     63       0.136      2.989    0.314
   1.304     23      2     66       0.130      2.991    0.310
   1.250     24      2     69       0.125      2.993    0.305
   1.200     25      2     72       0.120      2.994    0.300
   1.154     26      2     75       0.115      2.995    0.295
   1.111     27      2     78       0.111      2.997    0.289
   1.071     28      2     81       0.107      2.998    0.283
   1.034     29      2     84       0.103      2.999    0.277
   1.000     30      2     87       0.100      3.000    0.271



-Bl- 









100  3000  80  20  10  2

        K    N  NU1  NU2  OBPHI**2  OB LAMDA   POWER

   71.000    2    2    3     4.176     8.353   0.348
   46.000    3    2    6     3.172     9.517   0.516
   33.500    4    2    9     2.528    10.113   0.611
   26.000    5    2   12     2.080    10.400   0.668
   21.000    6    2   15     1.750    10.500   0.701
   17.429    7    2   18     1.497    10.479   0.719
   14.750    8    2   21     1.297    10.374   0.727
   12.667    9    2   24     1.134    10.209   0.728
   11.000   10    2   27     1.000    10.000   0.723
    9.636   11    2   30     0.887     9.757   0.714
    8.500   12    2   33     0.791     9.488   0.702
    7.538   13    2   36     0.708     9.199   0.687
    6.714   14    2   39     0.635     8.892   0.670
    6.000   15    2   42     0.571     8.571   0.651
    5.375   16    2   45     0.515     8.240   0.631
    4.824   17    2   48     0.465     7.898   0.610
    4.333   18    2   51     0.419     7.548   0.587
    3.895   19    2   54     0.379     7.192   0.564
    3.500   20    2   57     0.341     6.829   0.540
    3.143   21    2   60     0.308     6.462   0.515
    2.818   22    2   63     0.277     6.089   0.490
    2.522   23    2   66     0.248     5.713   0.464
    2.250   24    2   69     0.222     5.333   0.437
    2.000   25    2   72     0.198     4.950   0.410
    1.769   26    2   75     0.176     4.565   0.383
    1.556   27    2   78     0.155     4.177   0.356
    1.357   28    2   81     0.135     3.786   0.328
    1.172   29    2   84     0.117     3.394   0.300
    1.000   30    2   87     0.100     3.000   0.271



-B2- 
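
The K, OBPHI**2 and OB LAMDA columns of the table above can be reproduced from its parameter line. The sketch below is an inference from the printed values, not a transcription of the report's program. It assumes that 3000 is the total budget, that 80 and 20 are the fixed cost per subject and the cost per unit of test length, that N is the sample size entering the cost, and that the observed phi-squared follows 10K/(99 + K). That last form is what a Spearman-Brown adjustment would give for a unit-length reliability of 0.01 and a true-score value of 10, but this reading is an assumption. The function and argument names are hypothetical.

```python
def allocation_row(n, budget=3000.0, fixed=80.0, per_item=20.0,
                   phi2_true=10.0, unit_reliability=0.01):
    """Return (K, OBPHI**2, OB LAMDA) for sample size n.

    All relations here are inferred from the tabulated values and are
    assumptions, not formulas quoted from the report.
    """
    # Test length that exactly exhausts the budget: n * (fixed + per_item * K) = budget
    k = (budget / n - fixed) / per_item
    # Spearman-Brown reliability of a test of length K
    rel_k = k * unit_reliability / (1.0 + (k - 1.0) * unit_reliability)
    # Observed phi-squared shrinks the true-score value by the reliability
    ob_phi2 = phi2_true * rel_k
    # Observed noncentrality parameter
    ob_lamda = n * ob_phi2
    return k, ob_phi2, ob_lamda


if __name__ == "__main__":
    for n in (2, 10, 30):
        k, phi2, lam = allocation_row(n)
        print(f"{k:9.3f}{n:5d}{phi2:10.3f}{lam:10.3f}")
```

Calling `allocation_row(3, fixed=90.0, per_item=10.0)` gives the corresponding row for the 90/10 cost split used in the next table.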



100  3000  90  10  10

        K    N  NU1  NU2  OBPHI**2  OB LAMDA   POWER

  141.000    2    2    3     5.875    11.750   0.433
   91.000    3    2    6     4.789    14.368   0.694
   66.000    4    2    9     4.000    16.000   0.862
   51.000    5    2   12     3.400    17.000   0.971
   41.000    6    2   15     2.929    17.571   1.042
   33.857    7    2   18     2.548    17.839   1.086
   28.500    8    2   21     2.235    17.882   1.110
   24.333    9    2   24     1.973    17.757   1.120
   21.000   10    2   27     1.750    17.500   1.118
   18.273   11    2   30     1.558    17.140   1.107
   16.000   12    2   33     1.391    16.696   1.089
   14.077   13    2   36     1.245    16.184   1.065
   12.429   14    2   39     1.115    15.615   1.036
   11.000   15    2   42     1.000    15.000   1.003
    9.750   16    2   45     0.897    14.345   0.967
    8.647   17    2   48     0.803    13.656   0.927
    7.667   18    2   51     0.719    12.937   0.885
    6.789   19    2   54     0.642    12.194   0.841
    6.000   20    2   57     0.571    11.429   0.796
    5.286   21    2   60     0.507    10.644   0.748
    4.636   22    2   63     0.447     9.842   0.699
    4.043   23    2   66     0.392     9.025   0.649
    3.500   24    2   69     0.341     8.195   0.598
    3.000   25    2   72     0.294     7.353   0.545
    2.538   26    2   75     0.250     6.500   0.492
    2.111   27    2   78     0.209     5.637   0.438
    1.714   28    2   81     0.170     4.766   0.383
    1.345   29    2   84     0.134     3.887   0.327
    1.000   30    2   87     0.100     3.000   0.271



-B3-




100  3000  0  100  10  2

        K    N  NU1  NU2      NU  OBPHI**2  OB LAMDA  FALPHA   POWER

   15.000    2    2    3   3.453     1.316     2.632   9.558   0.137
   10.000    3    2    6   3.509     0.917     2.752   5.145   0.193
    7.500    4    2    9   3.539     0.704     2.817   4.258   0.223
    6.000    5    2   12   3.558     0.571     2.857   3.885   0.241
    5.000    6    2   15   3.571     0.481     2.885   3.682   0.253
    4.286    7    2   18   3.580     0.415     2.905   3.554   0.262
    3.750    8    2   21   3.587     0.365     2.920   3.468   0.268
    3.333    9    2   24   3.593     0.326     2.932   3.403   0.273
    3.000   10    2   27   3.597     0.294     2.941   3.354   0.277
    2.727   11    2   30   3.601     0.268     2.949   3.315   0.280
    2.500   12    2   33   3.604     0.246     2.956   3.285   0.283
    2.308   13    2   36   3.607     0.228     2.961   3.259   0.285
    2.143   14    2   39   3.609     0.212     2.966   3.238   0.287
    2.000   15    2   42   3.611     0.198     2.970   3.219   0.289
    1.875   16    2   45   3.613     0.186     2.974   3.204   0.290
    1.765   17    2   48   3.614     0.175     2.977   3.191   0.291
    1.667   18    2   51   3.616     0.166     2.980   3.178   0.292
    1.579   19    2   54   3.617     0.157     2.983   3.168   0.293
    1.500   20    2   57   3.618     0.149     2.985   3.158   0.294
    1.429   21    2   60   3.619     0.142     2.987   3.151   0.295
    1.364   22    2   63   3.620     0.136     2.989   3.143   0.296
    1.304   23    2   66   3.621     0.130     2.991   3.136   0.296
    1.250   24    2   69   3.621     0.125     2.993   3.129   0.297
    1.200   25    2   72   3.622     0.120     2.994   3.124   0.298
    1.154   26    2   75   3.623     0.115     2.995   3.119   0.298
    1.111   27    2   78   3.623     0.111     2.997   3.114   0.299
    1.071   28    2   81   3.624     0.107     2.998   3.109   0.299
    1.034   29    2   84   3.625     0.103     2.999   3.104   0.300
    1.000   30    2   87   3.625     0.100     3.000   3.102   0.300



-b4- 
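
The POWER column above comes from one of the report's approximations. As a rough cross-check only, the sketch below computes the power of the F test from the noncentral F distribution directly, as a Poisson-weighted sum of central tail probabilities. This is a substituted technique, not the least squares or Patnaik procedure used by the report's programs. It assumes that OB LAMDA is the usual noncentrality parameter, takes the critical value from the FALPHA column rather than computing it, and handles only even NU1 (true of every row here, where NU1 = 2). The function name and the truncation point are hypothetical.

```python
import math


def noncentral_f_power(f_crit, nu1, nu2, lam, terms=200):
    """P(F' > f_crit) for a noncentral F with (nu1, nu2) df and noncentrality lam.

    Assumes nu1 is even so that each central tail has a finite closed form.
    The series is truncated after `terms` Poisson terms.
    """
    if nu1 <= 0 or nu1 % 2 != 0:
        raise ValueError("this sketch only handles positive even nu1")
    if lam <= 0:
        raise ValueError("lam must be positive")
    y = nu2 / (nu2 + nu1 * f_crit)      # complement of the beta argument
    b = nu2 / 2.0
    half = lam / 2.0
    power = 0.0
    for j in range(terms):
        # Poisson weight with mean lam / 2, computed in log space
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        a = nu1 // 2 + j                # integer first beta parameter
        # Upper tail of Beta(a, b) at 1 - y, i.e. I_y(b, a) for integer a:
        # y**b * sum_{i < a} C(b + i - 1, i) * (1 - y)**i
        term, partial = 1.0, 0.0
        for i in range(a):
            partial += term
            term *= (b + i) / (i + 1) * (1.0 - y)
        power += weight * (y ** b) * partial
    return power


if __name__ == "__main__":
    # Last row of the table: NU1 = 2, NU2 = 87, OB LAMDA = 3.000, FALPHA = 3.102
    print(round(noncentral_f_power(3.102, 2, 87, 3.0), 3))
```

For the last row of the table the sketch returns a value of roughly 0.31, close to the tabulated 0.300.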




100  3000  80  20  10  2

        K    N  NU1  NU2      NU  OBPHI**2  OB LAMDA  FALPHA   POWER

   71.000    2    2    3   6.230     4.176     8.353   9.558   0.329
   46.000    3    2    6   6.806     3.172     9.517   5.145   0.553
   33.500    4    2    9   7.102     2.528    10.113   4.258   0.672
   26.000    5    2   12   7.244     2.080    10.400   3.885   0.726
   21.000    6    2   15   7.293     1.750    10.500   3.682   0.755
   17.429    7    2   18   7.283     1.497    10.479   3.554   0.771
   14.750    8    2   21   7.231     1.297    10.374   3.468   0.779
   12.667    9    2   24   7.149     1.134    10.209   3.403   0.782
   11.000   10    2   27   7.045     1.000    10.000   3.354   0.782
    9.636   11    2   30   6.925     0.887     9.757   3.315   0.755
    8.500   12    2   33   6.792     0.791     9.488   3.285   0.750
    7.538   13    2   36   6.648     0.708     9.199   3.259   0.742
    6.714   14    2   39   6.496     0.635     8.892   3.238   0.733
    6.000   15    2   42   6.338     0.571     8.571   3.219   0.722
    5.375   16    2   45   6.174     0.515     8.240   3.204   0.709
    4.824   17    2   48   6.005     0.465     7.898   3.191   0.694
    4.333   18    2   51   5.833     0.419     7.548   3.178   0.651
    3.895   19    2   54   5.657     0.379     7.192   3.168   0.634
    3.500   20    2   57   5.478     0.341     6.829   3.158   0.614
    3.143   21    2   60   5.298     0.308     6.462   3.151   0.593
    2.818   22    2   63   5.115     0.277     6.089   3.143   0.570
    2.522   23    2   66   4.931     0.248     5.713   3.136   0.521
    2.250   24    2   69   4.746     0.222     5.333   3.129   0.496
    2.000   25    2   72   4.559     0.198     4.950   3.124   0.469
    1.769   26    2   75   4.372     0.176     4.565   3.119   0.440
    1.556   27    2   78   4.185     0.155     4.177   3.114   0.408
    1.357   28    2   81   3.998     0.135     3.786   3.109   0.365
    1.172   29    2   84   3.811     0.117     3.394   3.104   0.333
    1.000   30    2   87   3.625     0.100     3.000   3.102   0.300



-B5-





100  3000  90  10  10  2

        K    N  NU1  NU2      NU  OBPHI**2  OB LAMDA  FALPHA   POWER

  141.000    2    2    3   7.914     5.875    11.750   9.558   0.429
   91.000    3    2    6   9.217     4.789    14.368   5.145   0.745
   66.000    4    2    9  10.029     4.000    16.000   4.258   0.870
   51.000    5    2   12  10.528     3.400    17.000   3.885   0.911
   41.000    6    2   15  10.813     2.929    17.571   3.682   0.936
   33.857    7    2   18  10.946     2.548    17.839   3.554   0.950
   28.500    8    2   21  10.968     2.235    17.882   3.468   0.956
   24.333    9    2   24  10.905     1.973    17.757   3.403   0.959
   21.000   10    2   27  10.777     1.750    17.500   3.354   0.960
   18.273   11    2   30  10.597     1.558    17.140   3.315   0.960
   16.000   12    2   33  10.376     1.391    16.696   3.285   0.959
   14.077   13    2   36  10.121     1.245    16.184   3.259   0.957
   12.429   14    2   39   9.838     1.115    15.615   3.238   0.944
   11.000   15    2   42   9.531     1.000    15.000   3.219   0.939
    9.750   16    2   45   9.205     0.897    14.345   3.204   0.933
    8.647   17    2   48   8.862     0.803    13.656   3.191   0.911
    7.667   18    2   51   8.505     0.719    12.937   3.178   0.900
    6.789   19    2   54   8.135     0.642    12.194   3.168   0.888
    6.000   20    2   57   7.755     0.571    11.429   3.158   0.852
    5.286   21    2   60   7.365     0.507    10.644   3.151   0.832
    4.636   22    2   63   6.967     0.447     9.842   3.143   0.783
    4.043   23    2   66   6.563     0.392     9.025   3.136   0.754
    3.500   24    2   69   6.152     0.341     8.195   3.129   0.719
    3.000   25    2   72   5.736     0.294     7.353   3.124   0.649
    2.538   26    2   75   5.317     0.250     6.500   3.119   0.600
    2.111   27    2   78   4.894     0.209     5.637   3.114   0.519
    1.714   28    2   81   4.470     0.170     4.766   3.109   0.457
    1.345   29    2   84   4.046     0.134     3.887   3.104   0.384
    1.000   30    2   87   3.625     0.100     3.000   3.102   0.300



-B6-





822 



OE 6000 (REV. 9-66)      DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
                                  OFFICE OF EDUCATION
                                  ERIC REPORT RESUME

ERIC ACCESSION NO.

ACCESSION NUMBER

RESUME DATE
09-27-67

P.A.

T.A.

ERIC REPRODUCTION RELEASE?
YES

TITLE

EFFECT OF ERROR OF MEASUREMENT ON THE POWER OF STATISTICAL TESTS.

Final Report

PERSONAL AUTHOR(S)

Cleary, T. Anne and Linn, Robert L.

INSTITUTION (SOURCE)
Educational Testing Service, Princeton, New Jersey

SOURCE CODE

REPORT/SERIES NO.

OTHER SOURCE
University of Wisconsin, Madison, Wisconsin

SOURCE CODE

OTHER REPORT NO.

OTHER SOURCE

SOURCE CODE

OTHER REPORT NO.

PUB'L. DATE 27 Sep 67      CONTRACT/GRANT NUMBER OEG-1-7-068574-2632

PAGINATION, ETC.

1 + 43 pp.

RETRIEVAL TERMS

Statistics
Mental Test Theory
Power
Reliability
Error of Measurement

IDENTIFIERS

ABSTRACT

The purpose of this research was to study the effect of error of measure-
ment upon the power of statistical tests. Attention was focused on the F test
of the single factor analysis of variance. Formulas were derived to show the
relationship between the noncentrality parameters for analyses using true scores
and those using observed scores. The effect of the reliability of the measure-
ment and the sample size were thus demonstrated. The assumptions of classical
test theory were used to develop formulas relating test length to the non-
centrality parameters.

Three methods of estimating power for different conditions of sample size
and test length were studied. The cost of an experiment was analyzed in terms
of a fixed cost per subject and a variable cost dependent upon test length.
Computer programs were written to use the least squares approximation and the
approximation based on Patnaik to estimate the power under all permissible
allocations of resources to sample size and test length. The program results
indicate which of the permissible allocations will result in maximum power.

To demonstrate empirically the effect of error of measurement on the
power of statistical tests, samples of persons and items were randomly drawn
from a large pool of data. Tests of 10, 20, and 40 randomly drawn items were
scored for samples with four and eight persons per group. The expected trends
were present but not definitive.



